Phylogenetic tree construction -- NJ method -- rapidnj


1. Introduction

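rapidnj is an efficient tool for building neighbour-joining (NJ) trees. It implements the canonical NJ method but uses a fast search heuristic to reduce the running time, which makes it faster than conventional NJ implementations.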
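For reference, the core of canonical NJ fits in a few lines of Python. This is a minimal O(n^3) sketch, not rapidnj's code: the function name `neighbor_joining` is hypothetical, and the sketch scans the whole Q-matrix at every step, which is exactly the search that rapidnj's heuristic is meant to shorten. At each step it joins the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the row sum of the current distance matrix.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], d: List[List[float]]) -> str:
    """Canonical (textbook) neighbour-joining; returns a Newick string.

    Illustrative O(n^3) sketch: no search heuristic, no disk caching.
    """
    nodes = list(names)
    # Dict-of-dicts copy so rows/columns can be addressed and dropped by name.
    dist: Dict[str, Dict[str, float]] = {
        a: {b: d[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r_i = sum_k d(i, k) over the current nodes.
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Full scan for the pair minimising Q(i, j); ties keep the first pair found.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and b.
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"  # the subtree string doubles as the node key
        # Distances from u to every remaining node k.
        dist[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                duk = 0.5 * (dist[a][k] + dist[b][k] - dist[a][b])
                dist[u][k] = duk
                dist[k][u] = duk
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Connect the last two nodes with the remaining distance.
    a, b = nodes
    return f"({a},{b}:{dist[a][b]:g});"
```

On a small additive five-taxon matrix (taxa a to e, used in the test) the sketch returns `((c:4,(a:2,b:3):3),(d:2,e:1):2);`. Because ties in Q are broken by taking the first pair found, another implementation may print an equivalent tree with a different arrangement.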

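rapidnj can handle very large distance matrices: according to its help text, it can reconstruct large trees with very little memory by using the hard disk as storage (options -k and -d). This is useful when working with large-scale genetic data or comparing many sequences. It also supports bootstrap resampling (-b), a method for assessing how reliable the branches of a tree are: the data are resampled many times, a tree is built from each replicate, and each branch of the original tree receives a support value. As input, rapidnj accepts a distance matrix in PHYLIP format, or a multiple sequence alignment in Stockholm, PHYLIP, or FASTA format, which makes it easy to combine with other bioinformatics tools.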
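The bootstrap idea can be illustrated with the standard nonparametric (column-resampling) bootstrap. The Python sketch below is an assumption about the general technique, not a description of rapidnj's internals: `bootstrap_alignment` and `split_support` are hypothetical names, and splits are assumed to be given as frozensets of taxon names in one canonical orientation (for example, always the side that does not contain a fixed reference taxon). Building the replicate trees and extracting their splits is left out.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set


def bootstrap_alignment(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw one bootstrap replicate: sample alignment columns with replacement."""
    lengths = {len(s) for s in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length = lengths.pop()
    # One shared list of column indices, so every row is resampled identically.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


def split_support(
    reference: Iterable[FrozenSet[str]], replicates: List[Set[FrozenSet[str]]]
) -> Dict[FrozenSet[str], float]:
    """Fraction of replicate trees that contain each split of the reference tree."""
    return {
        s: sum(s in rep for rep in replicates) / len(replicates) for s in reference
    }
```

Repeating `bootstrap_alignment` once per requested sample, building a tree from each replicate, and calling `split_support` gives each branch of the original tree its support value.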

2. Installation

Download the rapidnj source archive:

wget https://github.com/somme89/rapidNJ/archive/refs/tags/latest.zip

Unzip it:

unzip latest.zip

Compile (make builds the rapidnj executable under bin/):

cd rapidNJ-latest
make

Give the executable execute permission (after the cd above, the current directory is already rapidNJ-latest):

chmod +x bin/rapidnj

3. Usage

3.1 Program options

/share/nas1/yuj/software/rapidNJ_2.3.3/bin/rapidnj -h

Rapid neighbour-joining. An implementation of the canonical neighbour-joining method which utilize a fast search heuristic to reduce the running time. RapidNJ can be used to reconstruct large trees using a very small amount of memory by utilizing the HDD as storage.

USAGE: rapidnj INPUT [OPTIONS]
The INPUT can be a distance matrix in phylip (.phylip) format or a multiple alignment in stockholm (.sth) or phylip format (.phylip).
OPTIONS:
  -h, --help                display this help message and exit.
  -v, --verbose             turn on verbose output.
  -i, --input-format ARG    Specifies the type of input. pd = distance
                            matrix in phylip format, sth = multiple alignment in (single line) stockholm format.
                            fa = multiple alignment in (single line) FASTA format.
  -o, --output-format ARG   Specifies the type of output. t = phylogenetic tree in newick format
                            (default), m = distance matrix.
  -a, --evolution-model ARG Specifies which sequence evolution method to use when computing
                            distance estimates from multiple alignments. jc = juke cantor,
                            kim = Kimura's distance (default).
  -m, --memory-size         The maximum amount of memory which rapidNJ is allowed to use (in MB).
                            Default is 90% of all available memory.
  -k, --rapidnj-mem ARG     Force RapidNJ to use a memory efficient version of rapidNJ. The 'arg'
                            specifies the percentage of a sorted distance matrix which should be
                            stored in memory (arg=10 means 10%).
  -d, --rapidnj-disk ARG    Force RapidNJ to use HDD caching where 'arg' is the directory used to
                            store cached files.
  -c, --cores ARG           Number of cores to use for computating distance matrices from multiple
                            alignments. All available cores are used by default.
  -b  --bootstrap ARG       Compute bootstrap values using ARG samples. The output tree will be
                            annotated with the bootstrap values.
  -t, --alignment-type ARG  Force the input alignment to be treated as: p = protein alignment,
                            d = DNA alignment.
  -n  --no-negative-length  Adjust for negative branch lengths.
  -x  --output-file ARG     Output the result to this file instead of stdout.
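The -a option chooses the evolution model used to turn an alignment into distances: jc (Jukes-Cantor) or kim (Kimura, the default). For DNA, the standard textbook formulas are d_JC = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites, and the Kimura two-parameter distance d_K2P = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. The Python sketch below implements these formulas only. How rapidnj itself treats gaps, ambiguity codes, saturated pairs, and protein alignments (where Kimura's protein distance uses a different formula) is not covered; skipping gapped and non-ACGT sites is an assumption made here.

```python
import math
from typing import List, Tuple

PURINES = set("AG")
PYRIMIDINES = set("CT")


def _compared_sites(s1: str, s2: str) -> List[Tuple[str, str]]:
    # Assumption: only sites where both sequences have A/C/G/T are compared.
    return [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a in "ACGT" and b in "ACGT"]


def jukes_cantor(s1: str, s2: str) -> float:
    """Jukes-Cantor distance; raises ValueError if the pair is saturated."""
    sites = _compared_sites(s1, s2)
    p = sum(a != b for a, b in sites) / len(sites)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(s1: str, s2: str) -> float:
    """Kimura two-parameter distance; raises ValueError if the pair is saturated."""
    sites = _compared_sites(s1, s2)
    n = len(sites)
    # Transitions: A<->G or C<->T; every other mismatch is a transversion.
    transitions = sum(
        a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES) for a, b in sites
    )
    transversions = sum(a != b for a, b in sites) - transitions
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

With one A-to-G difference in ten sites, the two models already disagree slightly (about 0.107 for jc versus about 0.112 for kim), because K2P weights transitions and transversions separately.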

3.2 Running

/share/nas1/yuj/software/rapidNJ_2.3.3/bin/rapidnj -i fa input.fa -b 1000 > phytree.nwk

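This reads input.fa as a multiple alignment in FASTA format (-i fa), computes bootstrap support from 1000 samples (-b 1000), and writes the Newick tree to phytree.nwk; -x phytree.nwk writes the same file without shell redirection. Per the help text, the FASTA alignment must be single-line, with each sequence on one line. Wait until the progress output reaches 100%.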
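The help text states that -i fa expects a (single line) FASTA alignment, while many aligners wrap sequences over several lines. A minimal Python sketch for joining wrapped sequences is shown below; the function name `linearize_fasta` and the file names in the usage line are hypothetical, and any sequence lines that appear before the first header are silently dropped.

```python
from typing import Iterable, Iterator, List, Optional


def linearize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield header and sequence lines, with each sequence joined onto one line."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)  # sequence line before any header is dropped
    if header is not None:
        yield header
        yield "".join(chunks)
```

Usage: `with open("aln_wrapped.fa") as src, open("input.fa", "w") as dst: dst.writelines(out + "\n" for out in linearize_fasta(src))`.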